source: https://data.humdata.org/dataset/migrant-deaths-by-month About the Humanitarian Data Exchange The Humanitarian Data Exchange (HDX) is an open platform for sharing data, launched in July 2014. The goal of HDX is to make humanitarian data easy to find and use for analysis.
Migrant Deaths by month “Missing Migrants Project draws on a range of sources to track deaths of migrants along migratory routes across the globe. Data from this project are published in the report “Fatal Journeys: Tracking Lives Lost during Migration,” which provides the most comprehensive global tally of migrant fatalities for 2014, and estimates deaths over the past 15 years.”
Source: International Organization for Migration (IOM)
Date of Dataset: Apr 24, 2017
Observations: 2373 obs
Expected Update Frequency: Every day
Visibility: Public
Data Collection Methodology: http://missingmigrants.iom.int/methodology
Refugee migration is a global issue, and it must be addressed from different perspectives to find solutions. The available data should be used more effectively to monitor the current situation. Data scientists and analysts can contribute by using their skills to help decision makers and political parties make the right decisions and commit more effort.
I chose this dataset to contribute to understanding the refugee problem and to finding some solutions. Tracing the locations of deaths, studying which time of the year has the highest numbers, and examining other variables may give ideas for future actions to rescue migrants, or at least to prevent such deaths from happening again.
It is good to see data analysts contributing. There is another dataset on Kaggle, https://www.kaggle.com/jitender786/world-refugee-count-by-countries, addressing a similar problem. However, the dataset I used gives more detail and also meets the project data criteria.
There are 12 variables.
2 int: the missing and dead counts
2 num: the geographic coordinates, latitude and longitude
Region of origin: the region the migrants came from
Incident region: the region where the migrants died or went missing
Affected nationality: the migrants' nationality
An important variable is the date of the case report (DD/MM/YYYY format)
The other variables are not the focus of this analysis.
For the incident region, some abbreviations are used, as follows:
## 'data.frame': 2373 obs. of 12 variables:
## $ event_id : int 1 3 4 6 7 8 9 10 11 12 ...
## $ cause_type : Factor w/ 285 levels "","AH1N1 influenza virus, while stuck at border",..: 129 60 129 41 277 41 41 41 41 38 ...
## $ region_origin : Factor w/ 17 levels "","Caribbean",..: 9 4 9 8 15 8 8 8 1 8 ...
## $ affected_nationality : Factor w/ 214 levels "","13 Cuba, 1 Dominican Republic, 1 Colombia",..: 92 85 1 1 28 1 1 1 1 180 ...
## $ affected_missing : int 1 NA NA 6 NA NA NA NA 0 NA ...
## $ affected_dead : int 1 1 1 4 4 11 1 1 3 1 ...
## $ region_incident : Factor w/ 14 levels "","Caribbean",..: 7 3 7 7 12 7 7 7 7 7 ...
## $ date_reported : Factor w/ 814 levels "","01/01/2015",..: 120 71 71 25 25 25 25 25 740 740 ...
## $ meta_source_name : Factor w/ 588 levels "","102Nueve",..: 251 305 204 467 425 204 204 204 32 251 ...
## $ meta_source_reliability: Factor w/ 5 levels "","Partially verified",..: 5 3 5 3 3 5 5 5 3 5 ...
## $ geo_lat : num 36.9 16 36.5 37.3 13.4 ...
## $ geo_lon : num 27.3 -93.7 27.4 27.1 101 ...
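The structure above shows `date_reported` stored as a factor rather than as a date. A minimal sketch of one way to convert it, assuming the DD/MM/YYYY format stated earlier; the data frame `mm` and its values are made up for illustration, and the `date`, `year`, and `month` columns are hypothetical names, not taken from the original R code:

```r
# Toy stand-in for the real data; only the column names and the
# DD/MM/YYYY format come from the report, the values are invented.
mm <- data.frame(
  date_reported = factor(c("24/04/2015", "01/01/2015", "", "15/10/2014")),
  affected_dead = c(4L, 1L, 2L, 11L)
)

# Convert factor -> character -> Date. Strings that do not match the
# format (such as the empty level "") become NA.
mm$date <- as.Date(as.character(mm$date_reported), format = "%d/%m/%Y")

# Derive year and month for later grouping (hypothetical helper columns).
mm$year  <- as.integer(format(mm$date, "%Y"))
mm$month <- as.integer(format(mm$date, "%m"))

print(mm)
```

Going through `as.character()` first avoids accidentally converting the factor's internal integer codes instead of its labels.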
As reviewer 3 recommended, I can check for outliers using such graphs.
Explore more variables
## MM$affected_nationality
## :1549
## Mexico : 85
## Syria : 82
## Honduras : 73
## Afghanistan: 34
## Guatemala : 33
## (Other) : 517
As required in the second review, some changes have been made above.
I picked just the top 6 affected nationalities from the dataset (the first row of the summary is the empty, unrecorded value). Mexico and Syria are the two highest countries. I can understand what is happening in Syria given the civil war; however, Mexico and Honduras surprisingly have such high numbers.
## MM$cause_type
## Drowning : 503
## : 202
## Unknown (skeletal remains) : 196
## Presumed drowning : 128
## Sickness_and_lack_of_access_to_medicines: 120
## Vehicle_Accident : 93
## (Other) :1131
Again, I focused on only the top 5 causes of missing or dead migrants in the dataset.
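Both summaries above include an empty level among the most frequent values. A minimal sketch of how the top categories could be tabulated while leaving that empty level out; `top_n_levels` is a hypothetical helper and the `cause` vector is toy data, neither is from the original R code:

```r
# Hypothetical helper: n most frequent non-empty values of a factor
# or character vector.
top_n_levels <- function(x, n = 5) {
  x <- as.character(x)
  x <- x[!is.na(x) & x != ""]            # drop NA and the empty level
  head(sort(table(x), decreasing = TRUE), n)
}

# Toy data standing in for MM$cause_type.
cause <- factor(c("Drowning", "", "Drowning", "Vehicle_Accident",
                  "", "", "Presumed drowning", "Drowning"))

top <- top_n_levels(cause, n = 2)
print(top)
```

Filtering before tabulating keeps the empty level from taking one of the top slots, as it does in the `summary()` output.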
## [1] 10281
## [1] 10789
## [1] 21070
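The three values above are consistent with two column totals and their sum, since 10281 + 10789 = 21070; the output does not say which total is which. A minimal sketch of how such totals could be computed when a column contains `NA`, using the first five rows shown in the `str()` output as toy data (the variable names `total_missing`, `total_dead`, and `total_both` are hypothetical):

```r
# First five rows of the two count columns, as shown in the str() output.
mm <- data.frame(
  affected_missing = c(1L, NA, NA, 6L, NA),
  affected_dead    = c(1L, 1L, 1L, 4L, 4L)
)

# na.rm = TRUE skips the NA entries; without it, sum() would return NA.
total_missing <- sum(mm$affected_missing, na.rm = TRUE)
total_dead    <- sum(mm$affected_dead, na.rm = TRUE)
total_both    <- total_missing + total_dead

print(c(total_missing, total_dead, total_both))
```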
We can see from the above plot that 2015, between January and July, has the highest number of dead migrants. We can look at this period in more detail…
It shows that April 2015 has the highest number of migrant deaths. We can look deeper and check which nationality was affected the most, and where. What are the reasons behind this situation? All these questions and more arise just from looking at this graph; some of them will be answered.
It also shows that after this incident the death reports decreased. Perhaps the affected regions adopted stricter rules to prevent this from happening again…
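The monthly comparison described above can be sketched as a sum of deaths per calendar month. This is only an illustration under stated assumptions: the values in `mm` are invented, and the `date`, `month`, `monthly`, and `peak` names are hypothetical rather than taken from the original R code:

```r
# Toy data with invented values; only the column names come from the report.
mm <- data.frame(
  date_reported = c("03/04/2015", "19/04/2015", "02/05/2015", "11/10/2014"),
  affected_dead = c(10L, 40L, 5L, 7L)
)

# Parse the DD/MM/YYYY strings and build a sortable year-month key.
mm$date  <- as.Date(mm$date_reported, format = "%d/%m/%Y")
mm$month <- format(mm$date, "%Y-%m")

# Sum the deaths within each month.
monthly <- aggregate(affected_dead ~ month, data = mm, FUN = sum)

# The month with the largest total.
peak <- monthly[which.max(monthly$affected_dead), ]
print(monthly)
print(peak)
```

Note that the formula interface of `aggregate()` drops rows with `NA` by default, so events with no recorded deaths would not contribute to the totals.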
This shows that most incidents happened in North Africa. The second highest is the Mediterranean, followed by the US/Mexico border. Some regions are close to each other, which should be taken into account in the analysis and observations.
This plot shows just the reported cases in the MENA region over the years. It also shows that a very high number of cases was reported in the second half of 2014, and that the number then decreased dramatically in the first half of the following year. Perhaps some safety or political actions were taken after the big loss in 2014.
This plot shows just the reported cases in the Mediterranean region over the years.
Exploring the distribution of each variable, then both of them together.
## x y
## x 1.00 0.03
## y 0.03 1.00
##
## n= 2373
##
##
## P
## x y
## x 0.0885
## y 0.0885
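The output above reports a correlation coefficient (0.03), the sample size, and a p-value (0.0885) for two variables labelled `x` and `y`. Its layout resembles what `Hmisc::rcorr` prints, but that is an inference. A minimal sketch of the same kind of test using base R's `cor.test` instead, on simulated data that is not the report's data:

```r
# Simulated counts, for illustration only.
set.seed(1)
x <- rpois(200, 3)
y <- rpois(200, 3)

# Pearson correlation with a test of the null hypothesis r = 0.
ct <- cor.test(x, y, method = "pearson")

r <- unname(ct$estimate)   # correlation coefficient
p <- ct$p.value            # two-sided p-value

print(round(r, 2))
print(round(p, 4))
```

A coefficient close to zero, as in the report's output, indicates little linear association between the two variables.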
The relationship between the missing and dead values in the data, shown with a scatter plot. The axis limits have been adjusted with a log2 transformation.
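A minimal sketch of a log2 transformation on count data. Because `log2(0)` is `-Inf`, the sketch adds 1 before taking the logarithm; that offset is an assumption made here for illustration and is not necessarily what the original plot did. The vectors `n_missing` and `n_dead` are toy values:

```r
# Toy counts spanning several orders of magnitude.
n_missing <- c(0, 1, 6, 50, 500)
n_dead    <- c(1, 1, 4, 11, 300)

# log2(x + 1) keeps zero counts finite (log2(0 + 1) = 0).
lx <- log2(n_missing + 1)
ly <- log2(n_dead + 1)

print(round(lx, 2))
print(round(ly, 2))
```

On this scale, each unit step corresponds to a doubling, which spreads out the many small counts that a few very large events would otherwise compress.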
Review 3: suggestion to change the above box plot to the following.
This box plot shows the relationship between affected missing, cause type, and region. Because there are more than 200 levels of cause type, the first box plot (commented out in the R code) was not clear at all. I therefore narrowed it down to the top 20 causes in the plot above, and then to the top 5 in the second box plot. Again, drowning is the most frequent cause. However, the empty cause has a high count compared to the others because of missing data; it will be cleaned in the final plot.
## rn top5_cause
## 1: Drowning 503
## 2: Unknown (skeletal remains) 196
## 3: Presumed drowning 128
## 4: Sickness_and_lack_of_access_to_medicines 120
## 5: Vehicle_Accident 93
This histogram confirms my findings from the previous plots: drowning is the most common cause of death. It also shows the empty values, which will be removed.
## Mode FALSE TRUE NA's
## logical 2291 82 0
## Length Class Mode
## 136 character character
## Length Class Mode
## 0 character character
Here are just the Syrian deaths over the years. It shows that the first half of 2016 has the highest number of deaths among Syrians. This is likely related to the migration crisis and to the civil war, which was going through political negotiations at the time…
This looks in more depth at Syrian migrants. It shows the correlation between missing and dead Syrians over the years.
Scatter plot, as the 2nd reviewer suggested.
The number of missing and dead migrants over the years.
Comparing the total number of deaths in the Mediterranean and MENA regions over the years. The timeline shows that MENA, in blue, has fewer deaths than the Mediterranean total, in red.
There are 5 high peaks in MENA deaths, and the highest is about 500 cases at the beginning of the last quarter of 2014. It is also notable that in the fourth quarter of 2016 there are no reported deaths in MENA. We should study what happened during that period in both the incident regions and the migrants' regions of origin; perhaps some political issues affected this…
However, the Mediterranean reported some cases, all of them below 200. I would like to focus on this region because of the war in Syria, and I would also like to focus on Syrian cases.
This is the total number of (missing and dead) migrants over the years. I labeled the migrants who died by drowning over the years. These deaths are clearly distributed throughout the year.
Here I also applied a log2 transformation.
Both missing and dead cases have been partially verified before reporting. A similar number have also been verified. No unverified cases have been reported.
In the next section, I would like to plot the geographic latitude and longitude on a map for more detail. The dataset provides two variables, longitude and latitude, which can be used…
I will use this world map as the base.
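Before placing points on a map, it can help to keep only rows with usable coordinates. A minimal sketch under stated assumptions: the column names `geo_lat` and `geo_lon` come from the dataset structure, while the values, the `valid` flag, and `mm_map` are made up for illustration:

```r
# Toy coordinates; one row has a missing latitude, one an impossible one.
mm <- data.frame(
  geo_lat = c(36.9, 16.0, NA, 95.0),
  geo_lon = c(27.3, -93.7, 27.4, 101.0)
)

# Valid coordinates: both present, latitude in [-90, 90],
# longitude in [-180, 180].
valid <- !is.na(mm$geo_lat) & !is.na(mm$geo_lon) &
  mm$geo_lat >= -90  & mm$geo_lat <= 90 &
  mm$geo_lon >= -180 & mm$geo_lon <= 180

mm_map <- mm[valid, ]
print(mm_map)
```

Rows dropped this way are worth counting, since events without coordinates will silently disappear from any density map.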
The reported deaths in all years and all regions. The darker the area, the higher the number of deaths reported.
It is clear that most of the cases are close to the coasts (regions of origin). Stricter campaigns in the regions of origin might reduce the number of cases. It is also clear that some other cases happened at borders between countries. This point can be taken into account.
Missing and dead migrants in the world.
This map shows that there are 3 areas of the world with the highest numbers of migrant deaths. One is the Mexican/American border, another is the Mediterranean and MENA, and the last is in Africa. However, the Mediterranean and MENA area seems to be the highest of the 3. This is clear from the density visualization of the total number of deaths in the dataset.
Zooming in on the figure above and focusing on the Mediterranean and MENA regions (since they have the highest counts), this figure gives more detail on the movement of migrants before any incident happens.
In the map above, the highest rate appears in the red density area, which is the Mediterranean Sea. This is not surprising: it is the only barrier between wealthy countries with a good standard of living (Europe) and poor countries affected by war and unemployment. It is also clear that people from South East Africa are heading north, aiming for Europe. The UN and other countries should look into this problem, try to address the reasons behind this movement, and find solutions.
It is also notable that countries in Eastern and Southern Europe are affected the most. However, some migrants are aiming for the UK.
## MM$affected_nationality
## :1549
## Mexico : 85
## Syria : 82
## Honduras : 73
## Afghanistan: 34
## Guatemala : 33
## (Other) : 517
This bar chart studies just a part of the affected nationalities. I chose those countries as my areas of interest because they have the highest death counts among all nationalities (as shown in the summary).
From the graph, Mexico has the highest number of affected migrants among all countries (85). This is mainly because of the American-Mexican border, which is also shown in the maps above. However, Honduras, in Central America, also has a high number compared to the other top 6 countries worldwide. The common reason is that both countries are poor, with large populations and very low employment rates. These reasons, and perhaps others, are enough to drive people to seek a better life and therefore to migrate.
Syria is the second highest in the list of countries. This is not surprising given the civil war.
I should say that much data is missing because of the naming of the nationalities. Sometimes two nationalities are merged, and sometimes a different name is used (for example: Syrian, 16 cases; Syrian Arab Republic, some more; and Syria, Iraq, 4 cases).
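One way to catch the naming variants mentioned above is a case-insensitive pattern match instead of an exact comparison. A minimal sketch on a toy vector; the labels are the ones quoted in the text plus two fillers, and `is_syria` is a hypothetical name:

```r
# Toy nationality labels, including the variants quoted in the text.
nat <- c("Syria", "Syrian", "Syrian Arab Republic", "Syria, Iraq",
         "Mexico", "")

# TRUE wherever the label mentions Syria in any form.
# Mixed entries such as "Syria, Iraq" are flagged too, so counts
# based on this flag may include non-Syrian victims.
is_syria <- grepl("syria", nat, ignore.case = TRUE)

print(table(is_syria))
print(nat[is_syria])
```

An exact match on "Syria" would find only the first entry here, which illustrates how inconsistent naming can undercount a nationality.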
## rn top5_cause
## 1: Drowning 503
## 2: Unknown (skeletal remains) 196
## 3: Presumed drowning 128
## 4: Sickness_and_lack_of_access_to_medicines 120
## 5: Vehicle_Accident 93
The legend has been changed as the 2nd review required.
# Description Two:
From a first look at the data, and from the summaries conducted earlier, drowning seems to be the most common cause of death among migrants. This histogram shows the drowning deaths in pink. Moreover, presumed drowning is another cause that can be added to drowning. Both causes confirm the observation that drowning is the most common cause. This is also notable from the map plots above. We can also observe that most migrants use boats or ships for transportation, and the authorities should consider this in order to prevent incidents or rescue those affected.
I chose the top 5 causes based on the earlier summaries. I cleaned up the NA causes a bit. However, there is a cause named "Unknown" for when skeletal remains have been found. This may also be important to include; we could do further analysis to check the locations of all those cases and investigate further to prevent it from happening again.
Another important cause is sickness and lack of access to medicines. This is something the world can do something about! One solution is to predict the areas (this report could support such predictions, and further analysis could be done on this), and then be ready with paramedics, medications, volunteer doctors, etc.
Surprisingly, vehicle accidents! I think that if I traced their location/region, it would be at the American border or the Turkish-Syrian border, where boats are not possible or are a more difficult means of transportation.
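The guess about where vehicle accidents occur could be checked by counting those events per incident region. A minimal sketch with placeholder region names ("Region A" and so on are invented, as are the rows); only the column names and the `Vehicle_Accident` label come from the dataset summaries:

```r
# Toy events with placeholder regions.
mm <- data.frame(
  cause_type      = c("Vehicle_Accident", "Drowning", "Vehicle_Accident",
                      "Vehicle_Accident", "Drowning"),
  region_incident = c("Region A", "Region B", "Region A",
                      "Region C", "Region B"),
  stringsAsFactors = FALSE
)

# Keep only vehicle accidents, then count events per incident region.
veh <- mm[mm$cause_type == "Vehicle_Accident", ]
by_region <- sort(table(veh$region_incident), decreasing = TRUE)

print(by_region)
```

Run on the real data, the first entry of `by_region` would show which incident region reports the most vehicle accidents.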
First, I used only the data from the MENA region, after subsetting the main dataset. The main reason for this is to investigate migrants from Syria because of the current civil war. It has been all over the media, and with this I might get some answers.
I plotted the points and their density (total deaths in the MENA region). The plot has 3 areas of interest. The highest density is shown east of Turkey, where most Syrian refugees moved. From there they try to reach Greece or other European countries. The other high-density region is on the Turkish-Syrian border. This confirms the first observation, that Syrians escape to Turkey looking for a better place and then, after settling down, try to move illegally to Europe. I did a quick investigation and found that the European migrant crisis began during this time. Source: https://en.wikipedia.org/wiki/European_migrant_crisis
The above article answers many of the questions that came to my mind while dealing with this dataset. I will touch on this in the reflection.
I used the dataset to investigate the movement of migrants not only in the MENA and Mediterranean regions but also across the world. Some statistics were computed to check which region has the highest number of deaths. I found that MENA, including North Africa, and the Mediterranean have the highest numbers of deaths worldwide.
There are different reasons behind the high numbers in those regions. The Wikipedia article above gives an overview of the situation in Europe. The civil wars in Syria and Sudan, and other reasons related to the hardship of life in Africa, also play an important role in this.
However, I found some difficulties in dealing with the dataset. One of them is the categorical data: not many numeric or integer variables are provided. It would also be nice to include the gender of each reported case and whether or not they had children. These details would answer more questions and might help to prevent or reduce this in the future.
I was also surprised by the number of migrant deaths at the US-Mexico border. That is a big number compared to the war and hunger in the MENA and Mediterranean regions.
I did not touch the meta source name and reliability. We could investigate these in particular to verify the data for better use. I also did not look deeply into particular months and the weather conditions in each month. Weather of course decreases or increases the deaths and missing cases, since drowning is the most common cause of death.
In the end, I think that with data analysis we can definitely help to prevent such things from happening. However, political action must be taken to help these people and save their lives.